Assignment 4: Due Sunday, 16 September at 23:59

For help with Rmarkdown for reports, see this white paper from Carnegie Mellon University’s Department of Statistics and Data Science.

For each of the seven statistical distributions we covered in the last assignment (Normal, Student’s \(t\), \(\chi ^ 2\), \(F\), Binomial, Negative Binomial, and Poisson),

  1. Generate and store a random vector of 10,000 observations, using the same parameters as the last homework:
    1. \(N(\mu = 2, \sigma ^ 2 = 5 )\),
    2. \(t_{\nu = 4}\),
    3. \(\chi^2_{\nu = 2}\),
    4. \(F_{n = 90,\ m = 12}\),
    5. \(Bin(n = 9, p = 2/3)\),
    6. \(NBin(n = 5, p = 1/2)\), and
    7. \(Pois(\lambda = 3)\).
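# rnorm() takes the standard deviation, so the variance of 5 enters as sqrt(5)
i <- rnorm(10000, mean = 2, sd = sqrt(5))
# ncp = 0 gives the central t, chi-squared, and F distributions
ii <- rt(10000, df = 4, ncp = 0)
iii <- rchisq(10000, df = 2, ncp = 0)
iv <- rf(10000, df1 = 90, df2 = 12, ncp = 0)
v <- rbinom(10000, size = 9, prob = 2/3)
vi <- rnbinom(10000, size = 5, prob = 0.5)
vii <- rpois(10000, lambda = 3)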
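As a quick sanity check on the parameterisation, the sketch below compares the sample moments of a fresh normal draw with the target mean of 2 and variance of 5. The seed value and the names `x` and `check` are illustrative assumptions, not part of the assignment, so the numbers will differ from the unseeded output shown in this report.

```r
# Illustrative check; the seed is an arbitrary choice for reproducibility.
set.seed(1)
x <- rnorm(10000, mean = 2, sd = sqrt(5))

# Sample moments, to be compared with the targets mean = 2 and variance = 5.
check <- c(sample_mean = mean(x), sample_var = var(x))
print(round(check, 2))
```

With 10,000 draws both values should land close to their targets; a sample variance near 25 would indicate that the variance was passed where the standard deviation was expected.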
  2. Subset the first \(N = 6\) values from the vector, and of this subset
    i. calculate the 5-Number Summary,
summary(i[1:6])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.7467  1.6534  2.2234  2.9999  4.6863  5.8350
summary(ii[1:6])
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.59546 -0.08433  0.50154  1.12610  1.97938  4.15662
summary(iii[1:6])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1071  1.0867  1.5758  2.3137  3.1641  6.0063
summary(iv[1:6])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6320  0.8816  0.9870  1.2937  1.4852  2.6583
summary(v[1:6])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.00    5.00    5.00    6.00    7.25    9.00
summary(vi[1:6])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   4.250   5.500   5.333   6.750   7.000
summary(vii[1:6])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.250   2.000   1.833   2.000   3.000
    ii. plot the histogram of the subset, and
hist(i[1:6])

hist(ii[1:6])

hist(iii[1:6])

hist(iv[1:6])

hist(v[1:6])

hist(vi[1:6])

hist(vii[1:6])

    iii. plot the estimated density of this subset.

plot(density(i[1:6]))

plot(density(ii[1:6]))

plot(density(iii[1:6]))

plot(density(iv[1:6]))

plot(density(v[1:6]))

plot(density(vi[1:6]))

plot(density(vii[1:6]))

  3. Repeat Item 2 for the first \(N = 10,\ 20,\ 30,\ \text{and}\ 50\) values from the random vector you generated in Item 1. Remark on the changing behaviour as the sample size increases.

summary(i[1:10])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.8288  0.8109  1.8215  2.1669  2.8453  5.8350
summary(ii[1:10])
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -1.24151 -0.21697  0.03333  0.63274  1.10529  4.15662
summary(iii[1:10])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1071  0.9345  1.2519  1.8570  1.9656  6.0063
summary(iv[1:10])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6320  0.9014  0.9984  1.1880  1.1658  2.6583
summary(v[1:10])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.00    5.00    6.50    6.40    7.75    9.00
summary(vi[1:10])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     2.0     4.0     5.5     5.5     7.0    10.0
summary(vii[1:10])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    1.25    2.00    2.30    2.75    5.00
hist(i[1:10])

hist(ii[1:10])

hist(iii[1:10])

hist(iv[1:10])

hist(v[1:10])

hist(vi[1:10])

hist(vii[1:10])

plot(density(i[1:10]))

plot(density(ii[1:10]))

plot(density(iii[1:10]))

plot(density(iv[1:10]))

plot(density(v[1:10]))

plot(density(vi[1:10]))

plot(density(vii[1:10]))

summary(i[1:20])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.3975  0.8964  2.2683  2.3167  3.2270  5.9957
summary(ii[1:20])
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -2.62665 -0.62652 -0.10927  0.08884  0.75787  4.15662
summary(iii[1:20])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1071  1.0204  1.1499  2.1036  2.2238  9.2422
summary(iv[1:20])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4575  0.7868  0.9185  1.1802  1.3745  3.1243
summary(v[1:20])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00    5.00    6.00    5.80    6.25    9.00
summary(vi[1:20])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    3.00    4.50    4.85    7.00   10.00
summary(vii[1:20])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    1.00    2.00    2.35    3.00    5.00
hist(i[1:20])

hist(ii[1:20])

hist(iii[1:20])

hist(iv[1:20])

hist(v[1:20])

hist(vi[1:20])

hist(vii[1:20])

plot(density(i[1:20]))

plot(density(ii[1:20]))

plot(density(iii[1:20]))

plot(density(iv[1:20]))

plot(density(v[1:20]))

plot(density(vi[1:20]))

plot(density(vii[1:20]))

summary(i[1:30])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.3975  0.4987  1.8351  1.8947  3.0926  5.9957
summary(ii[1:30])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.6267 -0.4963 -0.0600  0.1540  0.7785  4.1566
summary(iii[1:30])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0318  1.0347  1.9062  2.5207  3.4851  9.2422
summary(iv[1:30])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4575  0.6367  0.9071  1.0927  1.1658  3.1243
summary(v[1:30])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     3.0     5.0     5.0     5.6     6.0     9.0
summary(vi[1:30])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     3.0     5.0     4.9     7.0    10.0
summary(vii[1:30])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.000   2.000   2.633   3.000  10.000
hist(i[1:30])

hist(ii[1:30])

hist(iii[1:30])

hist(iv[1:30])

hist(v[1:30])

hist(vi[1:30])

hist(vii[1:30])

plot(density(i[1:30]))

plot(density(ii[1:30]))

plot(density(iii[1:30]))

plot(density(iv[1:30]))

plot(density(v[1:30]))

plot(density(vi[1:30]))

plot(density(vii[1:30]))

summary(i[1:50])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.3975  0.7493  1.8376  2.0769  3.5612  6.8499
summary(ii[1:50])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.6267 -0.6887 -0.0600  0.1486  0.9206  4.1566
summary(iii[1:50])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0318  1.0036  1.3980  2.3129  3.4851  9.2422
summary(iv[1:50])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4289  0.7310  0.9779  1.1215  1.2746  3.1243
summary(v[1:50])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00    5.00    6.00    5.84    7.00    9.00
summary(vi[1:50])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    2.00    4.00    4.46    7.00   14.00
summary(vii[1:50])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    2.00    3.00    2.94    4.00   10.00
hist(i[1:50])

hist(ii[1:50])

hist(iii[1:50])

hist(iv[1:50])

hist(v[1:50])

hist(vi[1:50])

hist(vii[1:50])

plot(density(i[1:50]))

plot(density(ii[1:50]))

plot(density(iii[1:50]))

plot(density(iv[1:50]))

plot(density(v[1:50]))

plot(density(vi[1:50]))

plot(density(vii[1:50]))

# It can be inferred from the density plots and histograms that the graphs become smoother as the sample size increases.
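The repeated `summary()` calls for each sample size can also be generated in a loop. The sketch below assumes a hypothetical helper, `summarise_prefixes()`, which is not part of the original solution, and a freshly simulated vector with an arbitrary seed. It collects the summaries in one matrix so the sample sizes can be compared row by row.

```r
# Hypothetical helper: summary() of the first n values for each n in `sizes`.
summarise_prefixes <- function(x, sizes = c(6, 10, 20, 30, 50)) {
  out <- t(sapply(sizes, function(n) summary(x[seq_len(n)])))
  rownames(out) <- paste0("N=", sizes)
  out
}

set.seed(1)  # arbitrary seed, for reproducibility only
x <- rchisq(10000, df = 2)
res <- summarise_prefixes(x)
print(res)
```

Because each subset is a prefix of the next, the minimum can only decrease and the maximum can only increase down the rows, which matches the pattern in the output above.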
  4. Repeat Item 2 for the entire vector (\(N = 10000\)). For smaller values of \(N\) from continuous distributions, which tool do you think gave a better representation of the full data: histogram or density plot? Did this change when you inspected the discrete distributions?
summary(i)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -6.5984  0.5064  1.9752  1.9947  3.5057 10.6039
summary(ii)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -10.842620  -0.740901  -0.018082  -0.006291   0.718828  14.593535
summary(iii)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##  0.000107  0.588600  1.414373  2.030298  2.825117 22.332680
summary(iv)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2827  0.7869  1.0451  1.1987  1.4160 10.1421
summary(v)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    5.00    6.00    5.98    7.00    9.00
summary(vi)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   3.000   4.000   4.993   7.000  28.000
summary(vii)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.000   3.000   3.027   4.000  13.000
hist(i)

hist(ii)

hist(iii)

hist(iv)

hist(v)

hist(vi)

hist(vii)

plot(density(i))

plot(density(ii))

plot(density(iii))

plot(density(iv))

plot(density(v))

plot(density(vi))

plot(density(vii))

# For discrete distributions, the histogram is always the better choice. For continuous distributions, the histogram is better for small samples and the density plot is better for large samples.
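One way to see why a smoothed density can mislead for discrete data: a kernel density estimate places positive density between the integers, where a Poisson variable has no probability at all. The sketch below uses a freshly simulated sample with an arbitrary seed; the names `y`, `freq`, `d`, and `f_mid` are illustrative.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
y <- rpois(10000, lambda = 3)

# Exact relative frequencies: all mass sits on the integers.
freq <- table(y) / length(y)
print(round(freq, 3))

# Kernel density estimate (default Gaussian kernel and bandwidth),
# interpolated at the non-integer point 2.5.
d <- density(y)
f_mid <- approx(d$x, d$y, xout = 2.5)$y
print(f_mid)  # positive, although P(Y = 2.5) = 0
```

A bar plot of `freq`, or a histogram with integer-centred bins, keeps the mass on the support of the distribution.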
  5. Contrast the 5-Number Summaries at each of the sample sizes (6, 10, 20, 30, 50, and 10000) for the skewed distributions vs. the symmetric distributions.
# Comparing the 5-Number Summaries across the sample sizes: for the symmetric distributions, the median and mean move closer together as the sample size increases. For the skewed distributions, the median and mean do not converge; the gap between them persists even at N = 10000 (for example, a median of about 1.41 against a mean of about 2.03 for the chi-squared sample).
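The contrast can be made concrete by tracking the gap between mean and median as \(N\) grows. This sketch simulates fresh samples with an arbitrary seed, and `gap()` is a hypothetical helper rather than part of the original solution. For reference, the population gap is 0 for \(N(\mu = 2, \sigma ^ 2 = 5)\) and \(2 - 2\ln 2 \approx 0.61\) for \(\chi^2_{\nu = 2}\).

```r
# Hypothetical helper: mean minus median of the first n values.
gap <- function(x, n) mean(x[seq_len(n)]) - median(x[seq_len(n)])

set.seed(1)  # arbitrary seed, for reproducibility only
sizes <- c(6, 10, 20, 30, 50, 10000)
sym  <- rnorm(10000, mean = 2, sd = sqrt(5))  # symmetric
skew <- rchisq(10000, df = 2)                 # right-skewed

gaps <- data.frame(
  N      = sizes,
  normal = sapply(sizes, function(n) gap(sym, n)),
  chisq  = sapply(sizes, function(n) gap(skew, n))
)
print(gaps)

# Population gap for chi-squared with 2 df: mean 2, median qchisq(0.5, 2).
pop_gap <- 2 - qchisq(0.5, df = 2)
print(pop_gap)
```

At small \(N\) both columns fluctuate; at \(N = 10000\) the normal gap should sit near 0 while the chi-squared gap should sit near `pop_gap`.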